feat(key on repo url): support git hosts other than GitHub + multiple forks #1043

Open
wants to merge 92 commits into
base: main

Conversation

kriswest
Contributor

@kriswest kriswest commented Jun 3, 2025

resolves #950
resolves #511
resolves #66
resolves #1107
resolves #1028

Refactor (api, proxy & UI) to remove the assumption of GitHub as the git repository host and the use of the repository name field as the id of the repository (as this prevents git-proxy instances from supporting multiple forks of a project or projects from multiple hosts with the same name).

This PR:

  • Replaces the use of the repo name field in the API with the _id field generated by the database adaptors
    • Using the repository URL as a key does not work well with Express routing, but _id does in both MongoDB and NeDB
    • Allows names to be repeated (multiple forks or clashing names from different organisations/repository hosts)
    • UI and CLI were updated accordingly
  • Replaces the use of organisation/repoName.git in the proxy URLs with the repository URL
  • Disables GitHub-specific functionality in the UI if the host is not GitHub
  • Completes the application of TypeScript to the database classes
    • Duplicated code reduced
    • A number of minor differences in behaviour (particularly return types) between the DB adaptors were resolved
    • Does NOT refactor all usages of the DB client to use TypeScript (there are still many requires to eliminate)
  • Deprecates and ignores the config property proxyUrl, as the proxied host(s) are now determined from the configured repositories
  • Expands the tests for proxy routes and the Repo route of the API
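
To illustrate the move from organisation/repoName.git to the full repository URL in proxy URLs, here is a minimal sketch. The helper names `parseRepoUrl` and `toProxyUrl` are hypothetical and are not taken from the PR; the sketch only assumes that the proxy path is the upstream host followed by the upstream path, which matches the URLs that appear later in this conversation.

```typescript
// Hypothetical helpers (not from the PR): derive a proxy URL from a repository URL.
interface ProxyTarget {
  host: string; // e.g. "gitlab.com"
  path: string; // e.g. "/gitlab-community/meta.git"
}

function parseRepoUrl(repoUrl: string): ProxyTarget {
  const url = new URL(repoUrl);
  // Assumption: only HTTP(S) remotes are proxied.
  if (url.protocol !== 'https:' && url.protocol !== 'http:') {
    throw new Error(`Unsupported protocol in repository URL: ${repoUrl}`);
  }
  return { host: url.host, path: url.pathname };
}

function toProxyUrl(proxyBase: string, repoUrl: string): string {
  const { host, path } = parseRepoUrl(repoUrl);
  // Strip trailing slashes from the base so the result has exactly one separator.
  return `${proxyBase.replace(/\/+$/, '')}/${host}${path}`;
}
```

Because the host is part of the path, two repositories with the same organisation and name on different hosts map to different proxy URLs.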

To Do:

  • Annotate PR for review
  • Check test coverage
  • Implement additional tests for the proxy and fallback
    • implement tests for new proxy URLs for github.com
    • implement tests for fallback with legacy proxy urls for github.com
    • implement tests for gitlab.com
    • implement tests for non-github/non-gitlab repo
    • implement tests for multiple forks
  • Add support for GitLab API where repo is hosted at GitLab

(contributed as part of a GitLab CoCreate collaboration with help from @StingRayZA)

netlify bot commented Jun 3, 2025

Deploy Preview for endearing-brigadeiros-63f9d0 canceled.

Name Link
🔨 Latest commit 256ef3f
🔍 Latest deploy log https://app.netlify.com/projects/endearing-brigadeiros-63f9d0/deploys/689633b6585de700087a7230

codecov bot commented Jun 3, 2025

Codecov Report

❌ Patch coverage is 83.72093% with 98 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.77%. Comparing base (ddff723) to head (256ef3f).

Files with missing lines      Patch %   Lines
src/db/mongo/repo.ts          37.14%    22 Missing ⚠️
src/service/routes/repo.js    85.04%    16 Missing ⚠️
src/db/mongo/users.ts         37.50%    15 Missing ⚠️
src/db/file/users.ts          65.21%    8 Missing ⚠️
src/proxy/index.ts            80.95%    8 Missing ⚠️
src/db/file/repo.ts           86.36%    3 Missing and 3 partials ⚠️
src/db/index.ts               93.40%    4 Missing and 2 partials ⚠️
src/proxy/routes/index.ts     92.94%    6 Missing ⚠️
src/db/mongo/pushes.ts        60.00%    4 Missing ⚠️
src/proxy/routes/helper.ts    94.33%    3 Missing ⚠️
... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1043      +/-   ##
==========================================
- Coverage   83.25%   82.77%   -0.49%     
==========================================
  Files          59       66       +7     
  Lines        2449     2787     +338     
  Branches      280      335      +55     
==========================================
+ Hits         2039     2307     +268     
- Misses        366      433      +67     
- Partials       44       47       +3     

Contributor

@sam-holmes2 sam-holmes2 left a comment

LGTM after an initial scan through :) thanks for your contribution!

@kriswest
Contributor Author

kriswest commented Jun 5, 2025

Picked up a couple of test failures after merging main - will resolve (and start working on the additional tests needed).

Contributor

@jescalada jescalada left a comment

I went through the approval/rejection flows with a pre-existing repo, and things work well!

There is an issue with backwards compatibility with older, invalid databases from previous versions of GitProxy (unique URL enforcement with repos). This may also cause issues with the other files (pushes, users).

I also tested the Add Repo flow which caused my server to crash, maybe because of something wrong on my end (invalid input maybe?).

@kriswest

This comment was marked as resolved.

@jescalada
Contributor

I can release the unique constraint on the index to avoid this - however, I put it in to catch invalid data as, like the use of the repository name as an ID, it will result in the selection of the wrong repo project at times (although in this case the multiple records would have to be the same repo) and I thought it better for tests to fail etc. if data wasn't cleaned up from a previous run. Where do you think we should go with this @jescalada - an automated migration seems difficult as you'd just have to delete or programmatically edit one of the duplicate records...

I think catching and displaying a simple error message with the invalid entry/entries could be enough - so that the GitProxy administrator can quickly identify the issue and fix it manually. Thankfully, the error seems to occur on backend (db) startup, so end users wouldn't really have the app suddenly blowing up.
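
The idea of reporting the offending entries rather than failing on the unique index could look roughly like the sketch below. The `RepoRecord` shape and the function names are assumptions made for illustration; the PR's actual database types may differ.

```typescript
// Hypothetical record shape; only _id and url matter for this check.
interface RepoRecord {
  _id: string;
  name: string;
  url: string;
}

// Group records by URL and keep only the groups with more than one entry.
function findDuplicateUrls(repos: RepoRecord[]): Map<string, RepoRecord[]> {
  const byUrl = new Map<string, RepoRecord[]>();
  for (const repo of repos) {
    const group = byUrl.get(repo.url) ?? [];
    group.push(repo);
    byUrl.set(repo.url, group);
  }
  const duplicates = new Map<string, RepoRecord[]>();
  byUrl.forEach((group, url) => {
    if (group.length > 1) duplicates.set(url, group);
  });
  return duplicates;
}

// Build a short message an administrator can act on: one line per clashing URL.
function describeDuplicates(duplicates: Map<string, RepoRecord[]>): string {
  const lines: string[] = [];
  duplicates.forEach((group, url) => {
    lines.push(`${url}: ${group.map((r) => r._id).join(', ')}`);
  });
  return lines.join('\n');
}
```

Running such a check at database startup, before the unique index is created, would let the administrator see which records to remove or edit by hand.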

kriswest and others added 20 commits July 3, 2025 10:17
Typescript wasn't working on the DB classes due to their dependency imports with require.
@kriswest
Contributor Author

kriswest commented Aug 5, 2025

@andypols in the earlier iterations of this PR I was using the proxy middlewares directly in the Router, and the only good way to add or remove them was to restart. I ended up changing that to wrap them in a custom middleware which uses the proxy middlewares in the proxies object. That approach can more easily support updates without a restart, but it'd take a refactor of the proxy router to support updating (probably wrapping it in a class, or just merging it into the Proxy class) and a new function on the Proxy class to trigger the update (to replace the stop/start calls made in the API).

I'm loath to take that on before we merge the current state, as I think restarting for a new origin is a low (but not no!) impact flaw and we could deal with it in a subsequent issue/PR.
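
As a rough sketch of what updating the proxies object without a restart might involve, the class below reconciles a map of per-host handlers against the current list of hosts. Everything here is hypothetical: `ProxyRegistry`, `update` and `handle` are illustrative names, and `Handler` is a plain function standing in for an Express middleware so that the example stays self-contained.

```typescript
// Stand-in for a proxy middleware: takes the upstream path, returns the upstream URL.
type Handler = (path: string) => string;

class ProxyRegistry {
  private readonly proxies = new Map<string, Handler>();
  private readonly createProxy: (host: string) => Handler;

  constructor(createProxy: (host: string) => Handler) {
    this.createProxy = createProxy;
  }

  // Reconcile the registered proxies with the hosts currently configured.
  update(hosts: string[]): void {
    const wanted = new Set(hosts);
    for (const host of Array.from(this.proxies.keys())) {
      if (!wanted.has(host)) this.proxies.delete(host);
    }
    wanted.forEach((host) => {
      if (!this.proxies.has(host)) this.proxies.set(host, this.createProxy(host));
    });
  }

  // Dispatch on the first path segment; undefined means no proxy is registered for it.
  handle(url: string): string | undefined {
    const host = url.split('/')[1] ?? '';
    const proxy = this.proxies.get(host);
    return proxy ? proxy(url.slice(host.length + 1)) : undefined;
  }
}
```

With this shape, the API could call `update` after a repository is added or deleted instead of stopping and starting the proxy.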

@andypols
Contributor

andypols commented Aug 5, 2025

@kriswest

I'm loath to take that on before we merge the current state, as I think restarting for a new origin is a low (but not no!) impact flaw and we could deal with it in a subsequent issue/PR.

Fair enough — I can see that. It was just a question, not a criticism. From my (possibly overly simplistic) perspective, the proxy shouldn't care what the Git domain is. Happy to see it handled in a subsequent PR!

@kriswest
Contributor Author

kriswest commented Aug 5, 2025

@andypols

It was just a question, not a criticism

No problem! You've just got me thinking (out loud).

From my (possibly overly simplistic) perspective, the proxy shouldn't care what the Git domain is.

One thing that led to it is the express-http-proxy middleware taking the proxied domain as an argument (it will accept a function, but those can only rotate which host requests are proxied to, as the function doesn't have the request itself in scope). I guess another approach would be to create the proxy middleware in response to each request; I didn't do that on the (untested!) assumption that it could cause a performance bottleneck (e.g. by checking the proxied hosts in the DB on each request). However, that's likely a small penalty...
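
If the per-request lookup of proxied hosts did turn out to be costly, one common mitigation is a short-lived cache in front of the database query. The sketch below is an assumption-laden illustration, not something proposed in the PR: `CachedHosts` is a hypothetical name, `load` stands in for whatever query returns the proxied hosts, and the clock is injectable only to make the behaviour easy to test.

```typescript
class CachedHosts {
  private hosts: string[] = [];
  private expiresAt = 0;
  private readonly load: () => Promise<string[]>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(load: () => Promise<string[]>, ttlMs: number, now: () => number = () => Date.now()) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached hosts, reloading them once the time-to-live has elapsed.
  async get(): Promise<string[]> {
    if (this.now() >= this.expiresAt) {
      this.hosts = await this.load();
      this.expiresAt = this.now() + this.ttlMs;
    }
    return this.hosts;
  }

  // Force the next get() to reload, e.g. after a repo is added or deleted via the API.
  invalidate(): void {
    this.expiresAt = 0;
  }
}
```

Calling `invalidate` from the repo routes would keep the cache from serving a stale host list after a change.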

@kyet

kyet commented Aug 7, 2025

@kriswest Thank you very much for your help. Although the issue hasn't been resolved yet, I've taken a look at some of the suggestions you provided.

To debug I would be tempted to log the request in chain.ts, before and after parsing by parseAction and see what you get. ..

I added a console.log() to parseAction, but the request doesn't have a body (no req.body !! log printed). I'm not sure what part of the log I should pay close attention to. I'm struggling to debug due to my limited knowledge and skills.

diff --git a/src/proxy/chain.ts b/src/proxy/chain.ts
index 2130519c..94446568 100644
--- a/src/proxy/chain.ts
+++ b/src/proxy/chain.ts
@@ -32,6 +32,9 @@ let pluginsInserted = false;
 export const executeChain = async (req: any, res: any): Promise<Action> => {
   let action: Action = {} as Action;
   try {
+    console.log(req);
+    if (!req.body)
+        console.log("no req.body !!");
     action = await proc.pre.parseAction(req);
     const actionFns = await getChain(action);

@@ -54,6 +57,7 @@ export const executeChain = async (req: any, res: any): Promise<Action> => {
     }
   }

+  console.log(action);
   return action;
 };
  • 19.2 against a github repository with the same client and similar git config - is the issue specific to this PR?

Unfortunately, our company's firewall policy blocks push to github.com. I'll try testing from home later.

  • a different gitlab project clone, ensuring that there are changes to push - is the issue specific to the repo/checkout?

This issue also occurs in other projects and repositories.

P.S. There is a second PR that you may need #973 - relating to linking PRs to authors via email. I doubt that is involved in the issue, ..

I tried to merge PR 973 on this PR, but a conflict occurred. I'll likely need to try again after this PR and 973 are merged.

@kriswest
Contributor Author

kriswest commented Aug 7, 2025

Hi @kyet, hopefully 973 will be through review and merged shortly. I'll update this shortly after.

However, I can't think of anything (in the PR) that would result in req.body being undefined (I'm assuming it's undefined rather than null), but there are reasons that it can be, including the client not sending a body (or something in between you and the server removing it). Are you testing on localhost? If so, we can discount something on your network.

Happy to look at the request and headers in case it tells us something.

@kriswest
Contributor Author

kriswest commented Aug 7, 2025

Conflicts resolved and ready for another look. I haven't had a chance to test it yet, but the tests are all passing.

@kriswest
Contributor Author

kriswest commented Aug 7, 2025

I'm aware I haven't done anything in the documentation regarding this PR. That should probably be reviewed and work to add to the docs undertaken under a new issue.

Contributor

@jescalada jescalada left a comment

Looks good to me! I have a few comments - hope we can tackle these and get this PR ready to merge soon. 🚀

A few more things I was wondering:

What exactly are the "breaking" changes, and what are the steps an organization must follow to upgrade GitProxy to v2? I have a feeling that some of the issues I encountered might have been due to "bad data" - something that could be updated with a script to avoid errors in v2.

So two important action points for the v2 release:

  • Documenting the breaking changes for both #973 and this PR
  • Ideally automating the migration process for v1 -> v2 databases so GitProxy administrators don't need to do anything (and potentially mess up the upgrade process)
    • If automating is not plausible, we should document likely problems and their solutions (for example: normalizing .git ending on repo URLs to prevent frontend display bugs)
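
A migration step for the .git ending could be as small as the sketch below. It assumes, purely for illustration, that the canonical form of a stored URL ends in .git; the opposite convention would work equally well as long as it is applied consistently. `normalizeRepoUrl` and `planUrlMigration` are hypothetical names.

```typescript
// Hypothetical record shape; only _id and url are needed here.
interface StoredRepoUrl {
  _id: string;
  url: string;
}

// Assumption: the canonical form has no trailing slash and ends in ".git".
function normalizeRepoUrl(url: string): string {
  const trimmed = url.trim().replace(/\/+$/, '');
  return trimmed.endsWith('.git') ? trimmed : `${trimmed}.git`;
}

// List the records whose URL would change, without modifying anything.
function planUrlMigration(repos: StoredRepoUrl[]): Array<{ _id: string; from: string; to: string }> {
  return repos
    .map((repo) => ({ _id: repo._id, from: repo.url, to: normalizeRepoUrl(repo.url) }))
    .filter((change) => change.from !== change.to);
}
```

Producing a plan first, rather than rewriting records directly, matters because normalizing can make two previously distinct URLs identical, which is exactly the case the unique URL constraint rejects.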

.set('accept', 'application/x-git-upload-pack-request')
.buffer();

res2.should.have.status(404);

Contributor

This test is failing on my local setup:

  1) proxy express application
       should be restarted by the api and stop proxying requests for a host (e.g. gitlab.com) when the last project at that host is DELETED via the API:

      AssertionError: expected Response{ domain: null, …(34), …(2) } to have status code 404 but got 200
      + expected - actual

      -200
      +404
      
      at Context.<anonymous> (test/testProxyRoute.test.js:452:22)
      at Generator.next (<anonymous>)
      at fulfilled (test/testProxyRoute.test.js:5:58)
      at processTicksAndRejections (node:internal/process/task_queues:105:5)

Contributor Author

Do you have a gitlab repo in your data already? The test is assuming you don't. Does the failure persist if you delete the .data dir for the fileDB and re-run (and does it come back if you re-run)?

I'm able to run multiple times with the same data dir and do not get that failure.

... however, I note some oddities to the logging:

Restarting the proxy to remove a host
No plugins configured
Found 0 plugin modules
HTTP server closed
Initializing proxy router for origins: '["github.com"]'
        setting up origin: 'github.com'
        setting up catch-all route (github.com) for backwards compatibility
proxy keys registered:  ["/github.com/"]
HTTP Proxy Listening on 8000
processing request URL: '/gitlab.com/gitlab-community/meta.git/info/refs?service=git-upload-pack'
proxy keys registered:  ["/github.com/"]
        using fallback
Action processed: Allowed
    Request URL: /gitlab.com/gitlab-community/meta.git/info/refs?service=git-upload-pack
    Host:        127.0.0.1:58153
    User-Agent:  git/2.42.0
Request resolved to https://github.com/gitlab.com/gitlab-community/meta.git/info/refs?service=git-upload-pack
    ✔ should be restarted by the api and stop proxying requests for a host (e.g. gitlab.com) when the last project at that host is DELETED via the API (694ms)

Looks like the request was allowed by the fallback. It should have been blocked. I think the 404 must be coming from GitHub and making the test pass on my end. Not sure why it fails on yours, but clearly it needs a look.
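
The behaviour visible in the log above can be modelled in a few lines. This is a sketch inferred from the log output, not the PR's routing code: `resolve` and the `Resolution` type are hypothetical, and the `strict` branch is only one possible way to block such requests. It relies on the assumption that a legacy URL starts with a GitHub organisation or user name, which should not contain a dot, so a dotted first segment is treated as an unregistered host.

```typescript
type Resolution =
  | { kind: 'proxied'; target: string }
  | { kind: 'fallback'; target: string }
  | { kind: 'rejected' };

function resolve(url: string, hosts: string[], strict: boolean): Resolution {
  const firstSegment = url.split('/')[1] ?? '';
  if (hosts.includes(firstSegment)) {
    // New-style URL: /<host>/<org>/<repo>.git/...
    return { kind: 'proxied', target: `https://${url.slice(1)}` };
  }
  // Assumption: a dotted first segment is a hostname that is not registered.
  if (strict && firstSegment.includes('.')) {
    return { kind: 'rejected' };
  }
  // Legacy URL: /<org>/<repo>.git/... is forwarded to github.com unchanged.
  return { kind: 'fallback', target: `https://github.com${url}` };
}
```

With `strict` set to false, the gitlab.com request from the log resolves to https://github.com/gitlab.com/..., which is the target shown in the log; with `strict` set to true it is rejected before anything is forwarded.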

Contributor

@jescalada jescalada Aug 10, 2025

Removing the GitLab repo fixed this as well!

Might be nice to have before and after functions to prevent these - some developers might assume there's a problem in their code or the tests themselves.

Contributor Author

Yeah, but I'm in two minds as to whether we should clean up other repos on that domain or just fail out earlier and make it clear what the problem is... Possibly a case that runs first with a clear name like "Check that there are no existing GitLab repos in the database"? Alternatively, just prefix the relevant tests with an expect on a clearly named variable?

I can do either today hopefully, I just have to decide which way to go!

Contributor

Looking at these in more detail, I feel like the simplest choice is to make the reason for failure clear, since most contributors won't be testing with a pre-loaded database to begin with...
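
One way to make the reason for failure clear is a small precondition helper that the affected tests call first. The sketch below is hypothetical (`assertNoReposForHost` and the `StoredRepo` shape are not from the PR) and uses a plain Error rather than a particular assertion library.

```typescript
// Hypothetical record shape for repos already present in the test database.
interface StoredRepo {
  name: string;
  url: string;
}

// Fail early, with an explicit message, if the database already holds repos on the given host.
function assertNoReposForHost(repos: StoredRepo[], host: string): void {
  const clashing = repos.filter((repo) => new URL(repo.url).host === host);
  if (clashing.length > 0) {
    const urls = clashing.map((repo) => repo.url).join(', ');
    throw new Error(
      `Test precondition failed: the database already contains repos hosted at ${host} (${urls}). ` +
        'Remove them or delete the local data directory before running these tests.',
    );
  }
}
```

Called at the top of a test, or from a first case named after the precondition, this turns a confusing status code mismatch into a message that points at the local data.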


it('Proxy route helpers should return the proxied origin', async function () {
const origins = await getAllProxiedHosts();
expect(origins).to.eql([TEST_REPO.host]);

Contributor

This test is also failing on my local setup - perhaps due to missing test data? Or is it my local data that's causing the problem?

 2) add new repo
       Proxy route helpers should return the proxied origin:

      AssertionError: expected [ 'gitlab.com', 'github.com' ] to deeply equal [ 'github.com' ]
      + expected - actual

       [
      -  "gitlab.com"
         "github.com"
       ]
      
      at Context.<anonymous> (test/testRepoApi.test.js:266:24)
      at Generator.next (<anonymous>)
      at fulfilled (test/testRepoApi.test.js:5:58)
      at processTicksAndRejections (node:internal/process/task_queues:105:5)

Contributor Author

Yes, that's likely some existing repo pointing to gitlab.com. Try deleting your .data folder and re-running a few times.

Perhaps I should add an expect at the start of the test for there to be no repos with that origin?

Contributor

@jescalada jescalada Aug 10, 2025

I removed the extra hosts and it worked 👍🏼

Perhaps I should add an expect at the start of the test for there to be no repos with that origin?

This would be helpful in case people try to run the tests locally!

@jescalada
Contributor

Looks good! Just a few comments on the failing tests due to pre-existing GitLab origin (as well as other origins). It'd be fantastic if we could fix those to be agnostic of the data in the database.

@kriswest
Contributor Author

Making the tests agnostic of the data would be good, but I haven't figured out how yet - the requests that go out are affected by the response from the other end, so we'd probably need to mock something in the proxy, perhaps the URL it would forward a request on to, and then check it had been called and returned the right value?

@jescalada
Contributor

@kriswest I've taken another look at the tests, and since the failing ones are "end to end" if anything, it'd be harder than I thought to mock out the database dependency (and it only really makes sense for unit/function tests).

As long as the reason for failure is obvious for contributors, that should be enough for now so we can speed up the release. We can make the tests more robust in another issue (#978 and #1143 are related to this).

@kyet

kyet commented Aug 14, 2025

Hi, @kriswest. I found the culprit of the No body found problem!

I dumped the packets and found that my git client was actually sending the contents.

The pattern matching fails in the code below, and therefore req.body is not generated.

proxy/routes/index.ts

const isPackPost = (req: Request) =>
  req.method === 'POST' &&
  // eslint-disable-next-line no-useless-escape
  /^\/[^\/]+\/[^\/]+\.git\/(?:git-upload-pack|git-receive-pack)$/.test(req.url);

const teeAndValidate = async (req: Request, res: Response, next: NextFunction) => {
  if (!isPackPost(req)) return next();
  ..
  try {
    ..
    (req as any).body = buf;

In my case, req.url is /git.mygitlab.com/my.name/git-proxy.git/git-receive-pack (sanitized real name).

Upon reviewing the regex, I noticed that there is a change history in the commit below.

29e7d2a

You may need to revert or modify the regex changes.
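
The mismatch can be reproduced in isolation. In the sketch below, `originalPattern` is the regex quoted above, while `relaxedPattern` is a hypothetical alternative written for illustration (it is not the fix adopted in the PR) that accepts one or more path segments before the .git suffix, so that a leading host segment is allowed.

```typescript
// The pattern quoted above: exactly two path segments before the pack endpoint.
const originalPattern = /^\/[^\/]+\/[^\/]+\.git\/(?:git-upload-pack|git-receive-pack)$/;

// Hypothetical relaxation: one or more path segments, the last one ending in ".git".
const relaxedPattern = /^(?:\/[^\/]+)+\.git\/(?:git-upload-pack|git-receive-pack)$/;

const hostPrefixedUrl = '/git.mygitlab.com/my.name/git-proxy.git/git-receive-pack';
const legacyUrl = '/kyet/git-proxy.git/git-receive-pack';

// Three segments (host, owner, repo) do not fit the two-segment pattern.
const originalMatchesHostPrefixed = originalPattern.test(hostPrefixedUrl); // false
const originalMatchesLegacy = originalPattern.test(legacyUrl); // true

// The relaxed pattern accepts both shapes.
const relaxedMatchesHostPrefixed = relaxedPattern.test(hostPrefixedUrl); // true
const relaxedMatchesLegacy = relaxedPattern.test(legacyUrl); // true
```

Under the original pattern, any URL that carries the host as its first segment has three segments before the pack endpoint and is skipped, while a two-segment legacy URL still matches.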

I would like to share one more thing related to this issue. As I mentioned earlier, our company has a firewall issue that prevents us from pushing to github.com, so I tested it at home. The test environment is different, but when I set up a github.com proxy like this, proxy fails, but proxy2 succeeds.

[remote "proxy"]
        url = http://localhost:8000/github.com/kyet/git-proxy.git
        fetch = +refs/heads/*:refs/remotes/proxy/*
[remote "proxy2"]
        url = http://localhost:8000/kyet/git-proxy.git
        fetch = +refs/heads/*:refs/remotes/proxy2/*

If I think about it in relation to the above issue, when falling back to the default proxy, I assume that req.url probably doesn't contain github.com (e.g., /kyet/git-proxy.git/git-receive-pack)

7 participants